Cache oblivious storage and access heuristics for blocked matrix-matrix multiplication
Abstract
We investigate the effects of ordering in blocked matrix–matrix multiplication. We find that submatrices do not have to be stored contiguously in memory to achieve near-optimal performance. Instead, it is the choice of execution order of the submatrix multiplications that leads to a speedup of up to four times for small block sizes. This is in contrast to results for single matrix elements, which show that contiguous memory allocation quickly becomes irrelevant as the block size increases.
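The abstract's central claim is that the execution order of the submatrix multiplications matters more than whether each block is stored contiguously. The following is a minimal Python sketch of that idea. It assumes a 3-D Morton (Z-order) curve over the block-product triples (i, j, k) as the cache-oblivious execution order. The abstract does not say which orders or heuristics the paper actually uses. The names `morton3`, `block_schedule`, and `blocked_matmul` are hypothetical. The matrices stay in ordinary row-major form, and only the schedule changes.

```python
from itertools import product


def morton3(i, j, k, bits=10):
    """Interleave the low `bits` bits of (i, j, k) into one Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b + 2)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b)
    return key


def block_schedule(nb, order="row", bits=10):
    """List every block product C[i][j] += A[i][k] * B[k][j] as an (i, j, k) triple.

    order="row"    -> plain lexicographic i, j, k loop nest
    order="morton" -> triples sorted along a 3-D Z-order curve
                      (an assumed example of a cache-oblivious order)
    """
    if nb > (1 << bits):
        raise ValueError("too many blocks for the chosen Morton key width")
    triples = list(product(range(nb), repeat=3))
    if order == "morton":
        triples.sort(key=lambda t: morton3(*t, bits=bits))
    elif order != "row":
        raise ValueError(f"unknown order: {order!r}")
    return triples


def blocked_matmul(A, B, bs, order="row"):
    """Compute C = A @ B for square list-of-rows matrices using bs x bs blocks.

    A and B stay in ordinary row-major form: blocks are addressed by index
    ranges, not copied into contiguous per-block storage.
    """
    n = len(A)
    if n % bs:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // bs
    C = [[0.0] * n for _ in range(n)]
    for bi, bj, bk in block_schedule(nb, order):
        # One submatrix multiplication: C[bi][bj] += A[bi][bk] * B[bk][bj]
        for i in range(bi * bs, (bi + 1) * bs):
            a_row, c_row = A[i], C[i]
            for k in range(bk * bs, (bk + 1) * bs):
                a_ik, b_row = a_row[k], B[k]
                for j in range(bj * bs, (bj + 1) * bs):
                    c_row[j] += a_ik * b_row[j]
    return C
```

Both schedules perform the same set of block products and differ only in their order. For example, `block_schedule(4, "morton")` visits `(0, 0, 0), (0, 0, 1), (0, 1, 0), ...`, whereas the row order visits `(0, 0, 0), (0, 0, 1), (0, 0, 2), ...`. In pure Python, interpreter overhead hides cache effects, so this sketch only illustrates the scheduling. Measuring a speedup like the one reported would need a compiled block kernel.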
Similar resources
Cache oblivious matrix multiplication using an element ordering based on the Peano curve
One of the keys to tap the full performance potential of current hardware is the optimal utilisation of cache memory. Cache oblivious algorithms are designed to inherently benefit from any underlying hierarchy of caches, but do not need to know about the exact structure of the cache. In this paper, we present a cache oblivious algorithm for matrix multiplication. The algorithm uses a block recu...
Communication-Minimizing Algorithms for Matrix Multiplication
As computers increase in speed, the proportion of time spent on communication between cache and hard drive or between multiple processors continues to rise. For single processors, data must be moved between the processor’s fast-access cache and main memory, an operation that often takes many orders of magnitude longer than any arithmetic operation. When multiple levels of cache are present, a c...
A cache-oblivious sparse matrix–vector multiplication scheme based on the Hilbert curve
The sparse matrix–vector (SpMV) multiplication is an important kernel in many applications. When the sparse matrix used is unstructured, however, standard SpMV multiplication implementations typically are inefficient in terms of cache usage, sometimes working at only a fraction of peak performance. Cache-aware algorithms take information on specifics of the cache architecture as a parameter to ...
Two-dimensional cache-oblivious sparse matrix-vector multiplication
In earlier work, we presented a one-dimensional cache-oblivious sparse matrix–vector (SpMV) multiplication scheme which has its roots in one-dimensional sparse matrix partitioning. Partitioning is often used in distributed-memory parallel computing for the SpMV multiplication, an important kernel in many applications. A logical extension is to move towards using a two-dimensional partitioning. ...
Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods
In this article, we introduce a cache-oblivious method for sparse matrix vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behaviour during sparse matrix vector multiplication. Matrices are assumed to be stored in row-major format, by means ...
Journal: CoRR
Volume: abs/0808.1108
Publication date: 2008